Skip to content

Conversation

@harryswift01
Copy link
Contributor

@harryswift01 harryswift01 commented Aug 14, 2025

Summary

This PR refactors and enhances the entropy computation to support averaging over timesteps, improve robustness against missing data, and streamline matrix operations. It also includes improvements to trajectory iteration, dihedral handling, and unit test coverage.

Changes

Implement Averaging Over Timesteps

  • Introduced running average logic for force and torque covariance matrices.
  • Updated build_covariance_matrices and update_force_torque_matrices to compute averages incrementally.
  • Removed post-loop summation and division, ensuring matrices are averaged during processing.

Improve Robustness in Entropy and Dihedral Computation

  • Refactored entropy computation to handle None and empty states safely.
  • Added type-safe checks for lists and NumPy arrays.
  • Skipped dihedral conformation assignment when no dihedrals are present.
  • Prevented unnecessary allocation and ensured entropy is only computed when valid states exist.

Refactor Trajectory Iteration

  • Replaced direct iteration with zip(index, frame) for consistent indexing.
  • Ensured final frame is not skipped during iteration.

Eigenvalue Filtering Enhancements

  • Filtered out complex and non-positive eigenvalues.
  • Converted valid values to real numbers and logged discarded ones.
  • Optimised filtering with a single-pass mask and summary warning.

Unit Test Improvements

  • Updated unit tests to reflect refactored logic.
  • Tidied up comments for clarity and maintainability.

Default Grouping Configuration

  • Changed default grouping from each to molecules for more intuitive behavior.

Impact

  • Improves numerical stability and correctness of entropy calculations.
  • Reduces memory usage by avoiding full-frame storage.
  • Enhances code readability and maintainability.
  • Provides more intuitive default behavior for users.

- Replaced per-frame overwrite and post-loop sum/division with incremental running average updates, matching the approach used in `waterEntropy`.
- Added `frame_counts` tracking to compute averages on the fly without storing all frame data.
- Updated `build_covariance_matrices` and `update_force_torque_matrices` to initialize averages on first frame and update them in place each step.
- Ensures final returned matrices are already averaged and representative of all processed timesteps.
- Removed division by `n_frames` in `entropy.py` to retain running averages
- Replaced `+=` with `=` in `get_matrices` to avoid accumulation across frames
- Simplified force and torque matrix updates in `levels.py` for consistency
…ues:

- Added logging for complex and non-positive eigenvalues with index and value
- Applied mask to filter out invalid values using np.isreal, np.isclose, and positivity check
- Converted filtered values to pure real numbers using .real
- Replaced per-element logging with a single summary warning based on mask length
- Improved performance and readability by avoiding redundant loops
- Replaced direct iteration over trajectory slices with `zip(index, frame)`
- Ensures consistent indexing and avoids skipping final frame
… for `assign_conformation`:

- Replaced direct iteration over trajectory slices with `zip(index, frame)`
- Ensures consistent indexing and avoids skipping final frame
… dihedrals are present:

- Added check to avoid calling `assign_conformation` if `get_dihedrals` returns an empty list
- Prevented unnecessary allocation of empty `conformation` arrays
- Ensured `states` are only generated when valid dihedrals exist
- `conformational_entropy_calculation` is only called if the `state` is not empty
…afely:

- Replaced ambiguous truth checks with type-safe logic for lists and NumPy arrays
- Added guard against NoneType access for `states[group_id]` to prevent iteration errors
@harryswift01 harryswift01 added this to the 1.0.0 release milestone Aug 14, 2025
@harryswift01 harryswift01 self-assigned this Aug 14, 2025
@harryswift01 harryswift01 linked an issue Aug 14, 2025 that may be closed by this pull request
3 tasks
@harryswift01 harryswift01 added the feature request New feature or request label Aug 14, 2025
Copy link
Member

@jimboid jimboid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has had a lot of eyeballing from the whole team whilst it has been being developed. Test data suggests the code changes here are good. A very hard fought addition from the team here, well done all!

Copy link
Collaborator

@jkalayan jkalayan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look great, looks like we have a working version of CodeEntropy :)

@skfegan
Copy link
Member

skfegan commented Aug 15, 2025

In entropy.py._process_conformational_entropy line 432 is:

group_states = states[group_id] if group_id < len(states) else None

I am not sure this is correct. In the grouping_molecules function (in group_molecules.py), the groups are dictionaries with the keys as the molecule number of the first molecule of that type. This means that the group_ids are all integers, but may not be consecutive. For grouping == "each" or most of our current examples it probably works, but if there are systems with multiples of molecules depending on how they are ordered in the topology weird things might happen if you try using the group_id as a counter rather than treating it as a label.

@harryswift01
Copy link
Contributor Author

In entropy.py._process_conformational_entropy line 432 is:

group_states = states[group_id] if group_id < len(states) else None

I am not sure this is correct. In the grouping_molecules function (in group_molecules.py), the groups are dictionaries with the keys as the molecule number of the first molecule of that type. This means that the group_ids are all integers, but may not be consecutive. For grouping == "each" or most of our current examples it probably works, but if there are systems with multiples of molecules depending on how they are ordered in the topology weird things might happen if you try using the group_id as a counter rather than treating it as a label.

This was an issue relating the the -1 input of end, we have now ensured that if the input of -1 is used we correctly assign end as the length of the Universe trajectory, this will resolve the indexing issues we have previously highlighted.

@harryswift01 harryswift01 merged commit bc10e47 into main Aug 15, 2025
7 checks passed
@harryswift01 harryswift01 deleted the 127-add-averaging-over-timesteps branch August 15, 2025 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add averaging of force and torque matrices over timesteps

5 participants